Improving over-fitting in ensemble regression by imprecise probabilities

Authors

  • Lev V. Utkin
  • Andrea Wiencierz
Abstract

In this paper, generalized versions of two ensemble methods for regression based on variants of the original AdaBoost algorithm are proposed. The generalization of these regression methods consists in restricting the unit simplex for the weights of the instances to a smaller set of weighting probabilities. Various imprecise statistical models can be used to obtain a restricted set of weighting probabilities, the size of which depends in each case on a single parameter. For particular choices of this parameter, the proposed algorithms reduce to standard AdaBoost-based regression algorithms or to standard regression. The main advantage of the proposed algorithms over the basic AdaBoost-based regression methods is that they are less prone to over-fitting, because the weights of the hard instances are restricted. Furthermore, several simulations and applications indicate that the proposed regression methods perform better than the corresponding standard regression methods. © 2015 Elsevier Inc. All rights reserved.
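The restriction of the weight simplex described in the abstract can be illustrated with a small sketch. The abstract does not say which imprecise models are used, so the example below assumes one common choice, the ε-contamination (linear-vacuous mixture) model centred on the uniform distribution. The function name `restrict_weights` and the parameter `eps` are hypothetical and are not taken from the paper.

```python
import numpy as np


def restrict_weights(q, eps):
    """Map instance weights q into an eps-contamination set around uniform.

    Assumption (not from the paper's abstract): the restricted set is
    {(1 - eps) * u + eps * p : p in the unit simplex}, with u uniform.
    eps = 0 gives uniform weights (plain regression);
    eps = 1 leaves the boosting weights unchanged (full simplex).
    """
    q = np.asarray(q, dtype=float)
    if q.ndim != 1 or q.size == 0:
        raise ValueError("q must be a non-empty 1-D array")
    if np.any(q < 0) or q.sum() <= 0:
        raise ValueError("q must be non-negative with a positive sum")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")

    q = q / q.sum()                        # normalise onto the unit simplex
    uniform = np.full(q.size, 1.0 / q.size)
    return (1.0 - eps) * uniform + eps * q  # convex mixture stays a distribution


# A "hard" instance that boosting has pushed to weight 0.7:
boost_weights = [0.7, 0.1, 0.1, 0.1]
print(restrict_weights(boost_weights, 0.0))  # uniform weights
print(restrict_weights(boost_weights, 0.5))  # hard instance capped below 0.7
print(restrict_weights(boost_weights, 1.0))  # unchanged boosting weights
```

Under this assumed model no single instance can receive a weight above (1 - eps)/n + eps, which is one way to read the abstract's remark that the weights of hard instances are restricted.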

Similar resources

An imprecise boosting-like approach to regression

This paper is about a generalization of ensemble methods for regression which are based on variants of the basic AdaBoost algorithm. The generalization of these regression methods consists in restricting the unit simplex for the weights of the instances to a smaller set of weighting probabilities. The proposed algorithms cover the standard AdaBoost-based regression algorithms and standard regre...

Generating Probabilities From Numerical Weather Forecasts by Logistic Regression

Logistic models are studied as a tool to convert output from numerical weather forecasting systems (deterministic and ensemble) into probability forecasts for binary events. A logistic model is obtained by setting the logarithmic odds ratio equal to a linear combination of the inputs. Like any statistical model, logistic models will suffer from over-fitting if the number of inputs is comparable to th...

Estimation of Seasonal Precipitation Tercile-Based Categorical Probabilities from Ensembles

Ensemble simulations and forecasts provide probabilistic information about the inherently uncertain climate system. Counting the number of ensemble members in a category is a simple nonparametric method of using an ensemble to assign categorical probabilities. Parametric methods of assigning quantile-based categorical probabilities include distribution fitting and generalized linear regression....

MASTER THESIS by Paul Fink Ensemble methods for classification trees under imprecise probabilities

In this master thesis some properties of bags of imprecise classification trees, as introduced in Abellán and Masegosa (2010), are analysed. First, the statistical background of imprecise classification trees is outlined: an overview of measuring uncertainty within the concept of Dempster–Shafer theory is presented, followed by a discussion of its application in a tree–...

Fitting pedotransfer functions using fuzzy regression

Pedotransfer functions are predictive models of a certain soil property from other easily, routinely, or cheaply measured properties. The common approach for fitting pedotransfer functions is the conventional statistical regression method. Such an approach relies heavily on crisp observations and crisp relations among variables. In modeling natural systems, ...

Journal:
  • Inf. Sci.

Volume 317

Publication date: 2015